Pure methods generally perform excellently in either recommendation accuracy or diversity, whereas hybrid methods generally outperform pure ones in both accuracy and diversity, but face the dilemma of selecting the optimal hybridization parameter for different recommendation focuses. In this article, based on a user-item bipartite network, we propose a data characteristic based algorithm that relates the hybridization parameter to a characteristic of the data. Unlike previous hybrid methods, the present algorithm adaptively assigns an optimal parameter to each individual item according to the correlation between the algorithm and the item degrees. Compared with a highly accurate pure method and with a hybrid method that is outstanding in both recommendation accuracy and diversity, our method markedly improves performance on the long-standing cold-start problem, as well as recommendation diversity, while maintaining a high overall recommendation accuracy. Even compared with an improved hybrid method that is highly effective on the cold-start problem, the proposed method not only further improves the recommendation accuracy for cold items but also enhances recommendation diversity. Our work may provide a promising route toward better personalized recommendation by relating algorithms to dataset properties.

PACS numbers: 89.75.Hc, 87.23.Ge, 05.70.Ln



I. INTRODUCTION 

Favored by the ever-increasing amount of information, people can enjoy an abundant life; however, they also face the difficulty of finding what they actually prefer, for example, selecting a satisfactory dress from many brands, or finding an interesting book in a sea of books [1]. Recommendation engines have emerged as a powerful tool to help people cope with information overload [2]. With the growing demand for personalized recommendation, developing efficient recommendation methods has become one of the central scientific problems.

A great many algorithms have been proposed and have led to considerable progress, such as collaborative filtering (CF) [3, 4], content-based algorithms [5], and related extensive studies [13]. Recently, benefiting from the fruitful achievements of complexity theory, complex-network based recommendation algorithms have been proposed, which point to a promising direction for personalized recommendation [14–25]. Meanwhile, concepts from traditional physics have been introduced into algorithm design, e.g., the ideas of mass diffusion [15], [lj| and heat conduction [Til fioj l, which greatly improve recommendation accuracy and diversity.

Most previous studies can be classified into two categories: pure algorithms and hybrid algorithms. A pure method is a single algorithm without any combination with others, such as the standard CF method [3], whereas a hybrid algorithm combines different pure algorithms, such as the hybrid of collaborative filtering and content-based methods [f|, the hybrid of heat conduction and probability spreading (HHP) [19], and their variants [26, 27]. With respect to the two widely accepted evaluators of personalized recommendation, namely accuracy and diversity, pure methods generally either perform excellently in accuracy but poorly in diversity, or vice versa. For instance, the probability spreading (PBS) method [15] shows a great advantage in recommendation accuracy but a less outstanding performance in diversity, whereas the heat conduction (HTS) method [14] greatly improves recommendation diversity but at the cost of accuracy. Therefore, various hybrid methods have been proposed to improve both accuracy and diversity. A hybrid algorithm combines different pure methods through some functional form, and its recommendation performance is usually controlled by the value of a hybridization parameter. Generally speaking, at some optimal value of this parameter, a hybrid method indeed performs better than the pure methods in both accuracy and diversity.

However, how to find the optimal hybridization parameter remains an open question. For most hybrid methods, the optimal parameters obtained from different evaluators differ. So far, most algorithms adopt a single-evaluator selection, namely, choosing the parameter value that optimizes one evaluator, e.g., recommendation accuracy. However, different recommendation focuses may favor different evaluators. Consequently, a challenging question emerges: which evaluator should be taken as the common basis of optimization? Even though recommendation accuracy is widely accepted as the most important proxy in personalized recommendation, the cold-start problem and recommendation diversity also attract central interest [HI], [28], [29]. The cold-start problem refers to the difficulty of recommending new items, or recommending interesting items to new users, owing to the lack of activity records. Diversity and novelty are also significant marks of a system's vitality. As a result, the optimal hybridization parameter can hardly be the same for different recommendation purposes. Moreover, even when evaluating recommendation accuracy alone, different indicators may correspond to different optimal parameter values. For example, the ranking score [15] and the precision [30] are both used to evaluate recommendation accuracy, yet the optimal parameters they yield are usually inconsistent for a hybrid method [26].

Motivated by this dilemma of choosing a proper parameter for hybrid algorithms, in the present paper we propose a data characteristic based (DCB) algorithm by identifying a possible correlation between the hybridization parameter and a data characteristic, represented by item degrees. Instead of using a single evaluator as the basis for selecting the hybridization parameter, the DCB adaptively assigns an optimal parameter to each individual item according to the correlation between the algorithm and the dataset property. Tested on three datasets, our algorithm substantially improves performance on the cold-start problem and on recommendation diversity, while maintaining a high recommendation accuracy.

The remainder of this paper is organized as follows. Section II describes the bipartite network and the investigated recommendation algorithms. Section III introduces several popular indicators for evaluating recommendation performance, and Section IV describes the datasets. In Section V, we compare the present algorithm with a highly accurate pure algorithm, a hybrid method with both high accuracy and high diversity, and an improved hybrid method that is highly effective on the cold-start problem. Finally, Section VI concludes the paper.



II. ALGORITHMS

A recommendation system can be described by a bipartite network composed of a user set and an item set. The user set includes $m$ users, $U = \{u_1, u_2, \ldots, u_i, \ldots, u_m\}$, and the item set includes $n$ items, $O = \{o_1, o_2, \ldots, o_\alpha, \ldots, o_n\}$. If item $o_\alpha$ is collected by user $u_i$, a link is added between them. The adjacency matrix linking items and users is $A = \{a_{\alpha i}\}$, with $a_{\alpha i} = 1$ if item $o_\alpha$ is collected by user $u_i$ and $a_{\alpha i} = 0$ otherwise. The degree of an item is the number of links it owns, and the degree of a user is defined analogously. We call an item popular if its degree is large, and cold otherwise. The task of a recommendation algorithm is to provide each user with a ranked list of the items the user has not collected, and then to recommend the highest-ranked items.

In the following algorithms, a so-called "resource" is assigned to items. If the initial resource is labeled by the vector $f = [f_1, \ldots, f_n]$, the final resource $f' = [f'_1, \ldots, f'_n]$ is obtained by a redistribution process described by the linear transformation

$$ f' = W f, \tag{1} $$

where $W$ is the $n \times n$ resource reallocation matrix. Items are then ranked by their final resource, and those with the most resource are recommended to the user. How the resource is redistributed therefore plays a key role in the recommendation process.

The mass-diffusion based algorithm, referred to as the PBS, is reported to be highly accurate. The PBS is essentially a random-walk process: each item first distributes its resource equally among its neighboring users, each user then redistributes its total resource equally among its neighboring items, and each item finally obtains its resource by summing up the contributions from its neighboring users. The resource transformation matrix of the PBS reads

$$ W^{P}_{\alpha\beta} = \frac{1}{k_\beta}\sum_{j=1}^{m}\frac{a_{\alpha j}a_{\beta j}}{k_j}, \tag{2} $$

where $k_\beta$ is the degree of item $o_\beta$ and $k_j$ is the degree of user $u_j$. Because it assigns more resource to popular items, the PBS achieves a high recommendation accuracy, but potentially at the expense of recommendation diversity.

By incorporating a process analogous to heat conduction, the HTS method performs a similar random-walk redistribution. First, each user receives the average resource of its neighboring items; then each item in turn receives the average resource of its neighboring users. The transformation matrix then reads



$$ W^{H}_{\alpha\beta} = \frac{1}{k_\alpha}\sum_{j=1}^{m}\frac{a_{\alpha j}a_{\beta j}}{k_j}, \tag{3} $$

where $k_\alpha$ is the degree of item $o_\alpha$ and $k_j$ is the degree of user $u_j$. Unlike the PBS, the HTS assigns more resource to cold items and therefore performs well in recommendation diversity, but at the cost of recommendation accuracy.
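The two redistribution rules of Eqs. (1)–(3) can be compared directly in a few lines of code. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it builds the PBS and HTS matrices for a hypothetical toy item-by-user matrix `a` (with `a[alpha][j] = 1` when user $u_j$ collected item $o_\alpha$), and applies Eq. (1) with the initial resource taken, as an assumption, to be the items collected by the target user. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def degrees(a: List[List[int]]):
    """Item degrees k_alpha and user degrees k_j of an item-by-user 0/1 matrix."""
    n_items, n_users = len(a), len(a[0])
    k_item = [sum(row) for row in a]
    k_user = [sum(a[x][j] for x in range(n_items)) for j in range(n_users)]
    return k_item, k_user


def transfer_matrix(a: List[List[int]], kind: str) -> Matrix:
    """W^P of Eq. (2) for kind == "PBS", W^H of Eq. (3) for kind == "HTS"."""
    if kind not in ("PBS", "HTS"):
        raise ValueError("kind must be 'PBS' or 'HTS'")
    n_items, n_users = len(a), len(a[0])
    k_item, k_user = degrees(a)
    w = [[0.0] * n_items for _ in range(n_items)]
    for al in range(n_items):
        for be in range(n_items):
            # sum_j a_{alpha j} a_{beta j} / k_j, skipping users without links
            s = sum(a[al][j] * a[be][j] / k_user[j]
                    for j in range(n_users) if k_user[j] > 0)
            # PBS divides by the degree of the sending item beta,
            # HTS by the degree of the receiving item alpha.
            norm = k_item[be] if kind == "PBS" else k_item[al]
            w[al][be] = s / norm if norm > 0 else 0.0
    return w


def final_resource(w: Matrix, f: List[float]) -> List[float]:
    """Eq. (1): f' = W f."""
    return [sum(row[b] * f[b] for b in range(len(f))) for row in w]


# Toy data (hypothetical): 4 items x 3 users.
a = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
# Assumption: the initial resource is the set of items collected by user u_0.
f0 = [float(a[x][0]) for x in range(len(a))]
pbs_scores = final_resource(transfer_matrix(a, "PBS"), f0)
hts_scores = final_resource(transfer_matrix(a, "HTS"), f0)
```

Each column of the PBS matrix sums to one, so the PBS conserves the total resource, whereas each row of the HTS matrix sums to one, so every item receives a weighted average of its neighbors' resource.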

To achieve both high accuracy and high diversity, a hybrid method (HHP) was proposed [19] that elegantly combines heat conduction and mass diffusion:

$$ W^{H+P}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda}\,k_\beta^{\lambda}}\sum_{j=1}^{m}\frac{a_{\alpha j}a_{\beta j}}{k_j}, \tag{4} $$

where $\lambda \in [0, 1]$. Setting $\lambda = 1$ recovers the PBS of Eq. (2), and $\lambda = 0$ recovers the HTS of Eq. (3). When the hybridization parameter $\lambda$ is tuned to a suitable value, the HHP method performs well in both recommendation accuracy and diversity.
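A minimal sketch of Eq. (4), again in Python with a hypothetical toy matrix and hypothetical function names, shows how a single parameter interpolates between the two pure methods and how a ranking list is produced for one user. As before, taking the target user's collected items as the initial resource is an assumption of the sketch.

```python
from typing import List


def hhp_matrix(a: List[List[int]], lam: float) -> List[List[float]]:
    """Eq. (4): W_ab = sum_j a_aj a_bj / k_j, divided by k_a^(1-lam) * k_b^lam."""
    n, m = len(a), len(a[0])
    k_item = [sum(row) for row in a]
    k_user = [sum(a[x][j] for x in range(n)) for j in range(m)]
    w = [[0.0] * n for _ in range(n)]
    for al in range(n):
        for be in range(n):
            if k_item[al] == 0 or k_item[be] == 0:
                continue  # isolated items neither send nor receive resource
            s = sum(a[al][j] * a[be][j] / k_user[j] for j in range(m) if k_user[j] > 0)
            w[al][be] = s / (k_item[al] ** (1.0 - lam) * k_item[be] ** lam)
    return w


def recommend(a: List[List[int]], user: int, lam: float, top: int = 10) -> List[int]:
    """Rank the items the user has not collected by their final resource."""
    n = len(a)
    # Assumption: initial resource = the target user's collected items.
    f = [float(a[x][user]) for x in range(n)]
    w = hhp_matrix(a, lam)
    score = [sum(w[x][y] * f[y] for y in range(n)) for x in range(n)]
    candidates = [x for x in range(n) if a[x][user] == 0]
    candidates.sort(key=lambda x: score[x], reverse=True)
    return candidates[:top]


# Toy data (hypothetical): 5 items x 3 users.
a = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 1]]
top_pbs = recommend(a, user=0, lam=1.0)  # pure PBS limit
top_hts = recommend(a, user=0, lam=0.0)  # pure HTS limit
```

Intermediate values of `lam` give the HHP interpolation; the sketch deliberately says nothing about which value is optimal, since that choice is the parameter-selection problem this paper addresses.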

Based on the HHP method, an improved hybrid method (OHHP) was proposed [31] to address the cold-start problem. In the OHHP, individual item degrees are incorporated into the hybridization parameter of the original HHP, where the hybridization parameter reads

(5)

where $k_\beta$ is the degree of the examined item, $k_{\max}$ is the maximum degree over all items, and $\gamma$ is a tunable parameter. The OHHP effectively optimizes the probability-spreading factor in the transformation matrix of Eq. (4) according to the degree of each individual item; it therefore greatly enhances the recommendation accuracy for cold items, while keeping a high recommendation accuracy for the overall and the popular items.

Compared with the pure PBS and HTS methods, the hybrid methods show a great advantage in both recommendation accuracy and diversity; however, they suffer from the problem of selecting the hybridization parameter for different recommendation purposes. This dilemma inspires us to look for an efficient procedure for adapting the hybridization parameter. In the OHHP, the cold-start problem is well resolved by incorporating the item degree into the parameter selection, which points to a promising way of designing data characteristic based algorithms. The degree provides a simple description of dataset properties. If a universal relation between the hybridization parameter and the average degree could be revealed in some functional form $\lambda \sim f(\langle k\rangle)$, the parameter $\lambda$ could then be assigned adaptively. In general, however, the relation between $\lambda$ and the degree differs for different recommendation list lengths $L$. As shown in Fig. 1, for all three datasets (RYM, Netflix, and MovieLens; details are given in Section IV), the average degree $\langle k\rangle$ of the items recommended by the HHP depends on $\lambda$ in a way that varies with $L$, which suggests that no single relational function between the average degree and $\lambda$ holds for all $L$.

FIG. 1: The average degree $\langle k\rangle$ versus the hybridization parameter $\lambda$.
FIG. 2: The rescaled hybridization parameter $\lambda$ versus the rescaled average degree $\langle\tilde{k}\rangle$.

To obtain a relation between $\langle k\rangle$ and $\lambda$ that is independent of the recommendation list length $L$, we analytically investigate the recommendation bias of the hybrid algorithm. On average, the probability that a target user $i$ has collected an item $\beta$ is directly proportional to $\beta$'s degree $k_\beta$. Following the theoretical analysis in [31], we assume that each such link is independent of the other links; the expected score $f_{i\alpha}$ of each user-item pair can then be calculated as

$$ f_{i\alpha} \propto k_\alpha^{\lambda}\sum_{\beta=1}^{n}k_\beta^{2-\lambda} \propto k_\alpha^{\lambda}\int f(k)\,k^{2-\lambda}\,dk, \tag{6} $$

where $n$ is the number of items and $f(k)$ is the probability distribution function of item degrees. As suggested in Ref. [31], $f(k)$ obeys a power-law distribution, $f(k) \propto k^{-\nu}$. One can then calculate $f_{i\alpha}$ as

$$ f_{i\alpha} \propto k_\alpha^{\lambda}\int_{k_{\min}}^{k_{\max}} k^{2-\lambda-\nu}\,dk, \tag{7} $$



where $k_{\max}$ and $k_{\min}$ are the maximum and minimum item degrees, respectively, with $k_{\max} \gg k_{\min}$. Introducing an exponent $\mu$ defined in terms of $\nu$, one then obtains

(8)

Guided by the theoretical analysis in Eq. (8), we rescale the relation $\lambda \sim f(\langle k\rangle)$ into $\lambda \sim f(\langle\tilde{k}\rangle)$ by normalizing the $\lambda$-dependent average degree $\langle k(\lambda)\rangle$ as

$$ \langle\tilde{k}(\lambda)\rangle = \frac{\langle k(\lambda)\rangle - \langle k\rangle_{\min}}{\langle k\rangle_{\max} - \langle k\rangle_{\min}}, \tag{9} $$

where $\langle\cdots\rangle$ denotes an average over all items in the recommendation list, $\langle k\rangle_{\max} = \max\{\langle k(\lambda)\rangle_{\beta\in L,\,\lambda\in[0,1]}\}$, and $\langle k\rangle_{\min} = \min\{\langle k(\lambda)\rangle_{\beta\in L,\,\lambda\in[0,1]}\}$. This rescaling ensures $\langle\tilde{k}\rangle \in [0, 1]$. The dependence of $\lambda$ on $\langle\tilde{k}\rangle$ for recommendation list lengths $L$ = 10, 20, 30, 40, and 50 is shown in Fig. 2, where a satisfactory data collapse, independent of $L$, is observed for the RYM, the Netflix, and the MovieLens. We fit the rescaled relation $\lambda \sim f(\langle\tilde{k}\rangle)$ with a combined exponential function,

$$ \lambda = a\,e^{b\langle\tilde{k}\rangle} + c\,e^{d\langle\tilde{k}\rangle}. \tag{10} $$

The fitted coefficients are $(a, b, c, d) = (0.04, 3.31, -0.04, -12.28)$ for the RYM, $(0.03, 2.25, 1.75\times10^{-9}, 19.78)$ for the Netflix, and $(0.03, 2.48, 4.95\times10^{-7}, 14.05)$ for the MovieLens. For a specific individual item $\beta$, we replace the rescaled average degree $\langle\tilde{k}\rangle$ in Eq. (10) with a normalized individual item degree $\tilde{k}_\beta$, where $k_\beta$ is the degree of the examined item $\beta$. The hybridization parameter of item $\beta$ is then adaptively assigned by

$$ \lambda_\beta = a\,e^{b\tilde{k}_\beta} + c\,e^{d\tilde{k}_\beta}. \tag{12} $$

III. METRICS

Recommendation accuracy is undoubtedly one of the most important indicators of an algorithm's performance. As a complement to accuracy, recommendation diversity is regarded as an important evaluator of personalized recommendation. In our study, we use the ranking score and the precision to quantify recommendation accuracy, and the inter-diversity and the inner-diversity to quantify recommendation diversity. Moreover, to investigate specifically the recommendation accuracy for cold items, we further study an item-degree dependent ranking score and an item-degree dependent precision.
A. RECOMMENDATION ACCURACY

RANKING SCORE ($r$) [15]. The ranking score $r_{i\alpha}$ of item $o_\alpha$ for user $u_i$ is defined as

$$ r_{i\alpha} = \frac{p_{i\alpha}}{n - k_i}, \tag{13} $$

where $n$ is the number of items, $k_i$ is the degree of user $u_i$, and $p_{i\alpha}$ is the position of item $o_\alpha$ in the ranked list of all items not collected by $u_i$ (a list of length $n - k_i$). Since users generally collect the items they prefer, an algorithm is more accurate if it ranks the items of a user's deleted links higher among all of that user's uncollected items. The average ranking score $r$ is the mean of $r_{i\alpha}$ over all deleted links; the smaller $r$, the more accurate the algorithm.

To focus on the recommendation accuracy for cold items, we define an item-degree dependent ranking score $r_k$ as the average ranking score over items with the same degree $k$ [32].
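As a hedged sketch, the ranking score of Eq. (13) and its item-degree dependent average can be computed as follows. The function names, the input layout, and the tie-breaking rule (lower item index first among equal scores) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple


def ranking_score(scores: Sequence[float], collected: Set[int], item: int) -> float:
    """Eq. (13): r = p / (n - k_i) for one deleted link (user u_i, item)."""
    n = len(scores)
    # Uncollected items ranked by descending score.
    # Assumption: ties keep increasing item index (sorted() is stable).
    uncollected = sorted((x for x in range(n) if x not in collected),
                         key=lambda x: -scores[x])
    p = uncollected.index(item) + 1  # 1-based position
    return p / (n - len(collected))


def degree_dependent(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """r_k: mean ranking score over deleted links whose item has degree k."""
    total: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for k, r in pairs:
        total[k] += r
        count[k] += 1
    return {k: total[k] / count[k] for k in total}


# Hypothetical example: one user who collected item 0; items 2 and 1 are deleted links.
scores = [0.9, 0.1, 0.5, 0.7, 0.3]
r_2 = ranking_score(scores, {0}, 2)
r_1 = ranking_score(scores, {0}, 1)
```

The overall $r$ is the plain mean of all per-link values; restricting the pairs to low-degree items before averaging is one plausible way to obtain a low-degree summary such as $r_{k<10}$.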

PRECISION ($P$) [13]. The recommendation precision $P$ is defined as

$$ P = \frac{1}{mL}\sum_{i=1}^{m} q_i^{L}, \tag{14} $$

where $q_i^{L}$ is the number of user $u_i$'s deleted links contained in the top-$L$ recommendation list. The larger $P$, the more accurate the algorithm.

Similarly, to better understand the recommendation accuracy for cold items, we define an item-degree dependent precision

$$ P_k = \frac{1}{mL}\sum_{i=1}^{m} q_{i,k}^{L}, \tag{15} $$

where $q_{i,k}^{L}$ is the number of user $u_i$'s deleted links to items of degree $k$ contained in the top-$L$ recommendation list.
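A minimal Python sketch of Eqs. (14) and (15) follows, assuming each user's top-$L$ list and deleted links are available as plain dictionaries (a hypothetical layout). Passing a degree filter turns the overall precision into an item-degree dependent one; the function name and the data are hypothetical.

```python
from typing import Callable, Dict, List, Set


def precision(rec: Dict[int, List[int]], test: Dict[int, Set[int]], L: int,
              keep: Callable[[int], bool] = lambda item: True) -> float:
    """Eq. (14); with `keep` selecting items by degree it gives Eq. (15)."""
    m = len(rec)
    # Count deleted links that appear in each user's top-L list.
    hits = sum(1 for u, items in rec.items()
               for x in items[:L] if x in test.get(u, set()) and keep(x))
    return hits / (m * L)


rec = {0: [2, 3], 1: [0, 4]}       # top-L lists (hypothetical)
test = {0: {3}, 1: {0, 4}}         # deleted links per user (hypothetical)
item_degree = [5, 1, 2, 3, 1]      # training-set item degrees (hypothetical)
p_all = precision(rec, test, L=2)
p_cold = precision(rec, test, L=2, keep=lambda x: item_degree[x] < 2)
```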

B. RECOMMENDATION DIVERSITY 

INTER-DIVERSITY ($D_{\rm inter}$). $D_{\rm inter}$ quantifies the difference between the recommendation lists of different users:

$$ D_{\rm inter} = \frac{2}{m(m-1)}\sum_{i=1}^{m}\sum_{j=i+1}^{m}\left(1 - \frac{C_{ij}(L)}{L}\right), \tag{16} $$

where $C_{ij}(L)$ is the number of common items in the top-$L$ recommendation lists of users $u_i$ and $u_j$. Generally, the greater $D_{\rm inter}$, the more personalized the recommendations for different users, and vice versa.
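Equation (16) translates directly into code. In the sketch below (hypothetical toy lists and function name), the number of common recommended items of two users is computed as the size of the intersection of their top-$L$ sets.

```python
from itertools import combinations
from typing import List


def inter_diversity(lists: List[List[int]], L: int) -> float:
    """Eq. (16): average of 1 - (common items)/L over all unordered user pairs."""
    m = len(lists)
    top = [set(lst[:L]) for lst in lists]
    total = sum(1.0 - len(top[i] & top[j]) / L for i, j in combinations(range(m), 2))
    return 2.0 * total / (m * (m - 1))


# Hypothetical top-2 lists for three users.
d = inter_diversity([[0, 1], [0, 2], [3, 4]], L=2)
```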

INNER-DIVERSITY ($D_{\rm inner}$). $D_{\rm inner}$ measures the difference among the items within a single user's recommendation list:

$$ D_{\rm inner} = \frac{1}{mL(L-1)}\sum_{i=1}^{m}\sum_{\alpha\neq\beta}\left(1 - s_{\alpha\beta}\right), \tag{17} $$

where the inner sum runs over all ordered pairs of distinct items in user $u_i$'s top-$L$ list, and

$$ s_{\alpha\beta} = \frac{1}{\sqrt{k_\alpha k_\beta}}\sum_{l=1}^{m}a_{\alpha l}a_{\beta l} $$

is the cosine similarity between items $o_\alpha$ and $o_\beta$. Generally, the greater $D_{\rm inner}$, the more diversified the recommendation list of a single user, and vice versa. A large $D_{\rm inner}$ thus indicates that the algorithm can potentially broaden each user's horizons by recommending less similar items.
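A small sketch of Eq. (17) and the cosine similarity, under the same hypothetical item-by-user 0/1 matrix convention used above (the matrix, lists, and function names are illustrative only):

```python
import math
from itertools import permutations
from typing import List


def cosine(a: List[List[int]], x: int, y: int) -> float:
    """s_xy = sum_l a_xl a_yl / sqrt(k_x k_y) for an item-by-user 0/1 matrix."""
    kx, ky = sum(a[x]), sum(a[y])
    if kx == 0 or ky == 0:
        return 0.0
    common = sum(u * v for u, v in zip(a[x], a[y]))
    return common / math.sqrt(kx * ky)


def inner_diversity(a: List[List[int]], lists: List[List[int]], L: int) -> float:
    """Eq. (17): average of 1 - s over ordered pairs of distinct items per top-L list."""
    m = len(lists)
    total = sum(1.0 - cosine(a, x, y)
                for lst in lists for x, y in permutations(lst[:L], 2))
    return total / (m * L * (L - 1))


a = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 1]]  # hypothetical
d = inner_diversity(a, [[2, 4]], L=2)
```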



IV. DATA 

We test the algorithm performance on three datasets: the RYM, the Netflix, and the MovieLens. The RYM is a music rating system with ten rating levels, while the Netflix and the MovieLens are movie rating systems with five rating levels. The RYM dataset was downloaded from the music rating website RateYourMusic.com, the Netflix dataset was obtained by random sampling from the huge Netflix Prize dataset, and the MovieLens was downloaded from the website of GroupLens Research. Because the rating scales differ, we coarse-grain all three datasets into a unary form: an item is regarded as collected by a user if the rating is at least three for the Netflix and the MovieLens, and at least six for the RYM. The Netflix contains 10000 users, 6000 items, and 701947 links, and the MovieLens contains 943 users, 1682 items, and 82520 links. In the RYM, a few items have particularly large degrees, much higher than those of the remaining items; we therefore remove about 10 items with huge degrees, after which the RYM contains 33786 users, 5381 items, and 613387 links. The sparsity of a dataset, defined as the number of links divided by the total number of possible user-item pairs, is 0.337%, 1.17%, and 5.20% for the RYM, the Netflix, and the MovieLens, respectively.

We divide each dataset into a training set and a test set: 10% of the links are randomly removed to form the test set, and the remaining 90% form the training set. The training set is used to make predictions for users, and the test set is used to evaluate the algorithm performance.
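The preprocessing steps above (coarse-graining, sparsity, and the 90/10 split) can be sketched as follows. The function names, the fixed random seed, and the rounding of the test-set size are assumptions of this sketch. Applied to the user, item, and link counts given above, `sparsity` gives the listed values; for the RYM, for instance, 613387 / (33786 × 5381) ≈ 0.337%.

```python
import random
from typing import List, Sequence, Tuple

Link = Tuple[int, int]  # (user, item)


def to_links(ratings: Sequence[Tuple[int, int, float]], threshold: float) -> List[Link]:
    """Coarse-grain ratings: a rating >= threshold counts as a collected item."""
    return [(u, o) for u, o, r in ratings if r >= threshold]


def sparsity(n_links: int, n_users: int, n_items: int) -> float:
    """Number of links divided by the number of possible user-item pairs."""
    return n_links / (n_users * n_items)


def split_links(links: Sequence[Link], test_fraction: float = 0.1, seed: int = 0):
    """Randomly move `test_fraction` of the links into the test set."""
    rng = random.Random(seed)  # assumption: fixed seed for reproducibility
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))  # assumption: rounded size
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)
```

With the thresholds stated above, one would call `to_links(ratings, 3)` for the Netflix and the MovieLens and `to_links(ratings, 6)` for the RYM.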

V. RESULTS 

To test the effectiveness of the DCB algorithm, we compare its performance with the PBS, the HHP, and the OHHP. The PBS is a pure method with high accuracy, the HHP is a hybrid method that resolves the dilemma between recommendation accuracy and diversity, and the OHHP is an improved hybrid method that further addresses the cold-start problem. A summary of the performance of the PBS, the HHP, the OHHP, and the DCB is presented in Table I.

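To quantify how much the DCB outperforms the other three algorithms, we define a percentage improvement $S_{\rm ALG}$,

$$ S_{\rm ALG} = \frac{Q_{\rm ALG} - Q_{\rm DCB}}{Q_{\rm DCB}}, \tag{18} $$

where the subscript ALG refers to the investigated algorithm and $Q$ is the value of a metric, i.e., $r$, $r_{k<10}$, $P$, $P_{k<10}$, $D_{\rm inter}$, or $D_{\rm inner}$. Equation (18) applies as written to the ranking scores, for which smaller is better; for $P$, $P_{k<10}$, $D_{\rm inter}$, and $D_{\rm inner}$ the sign is reversed, $S_{\rm ALG} = (Q_{\rm DCB} - Q_{\rm ALG})/Q_{\rm DCB}$, so that a positive value always means that the DCB outperforms ALG. The percentage improvements $S_{\rm ALG}$ of the PBS, the HHP, and the OHHP against the DCB are summarized in Table II.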
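The sign convention of Table II (positive whenever the DCB is better) can be captured in one small, hypothetical helper. Applied to what appear to be the MovieLens HHP and DCB entries of Table I ($r$ = 0.085 vs. 0.091, $P$ = 0.087 vs. 0.081), this reading reproduces the -6.6% and -7.4% quoted in the text; treating those entries as the MovieLens rows is an assumption about the partly garbled table.

```python
def improvement(q_alg: float, q_dcb: float, larger_is_better: bool) -> float:
    """Percentage improvement of the DCB over ALG, positive when the DCB is better.

    Smaller-is-better metrics (r, r_k<10) use Eq. (18) as written; for
    larger-is-better metrics (P, P_k<10, D_inter, D_inner) the sign is flipped.
    """
    if larger_is_better:
        return (q_dcb - q_alg) / q_dcb
    return (q_alg - q_dcb) / q_dcb


# Assumption: values read from the MovieLens rows of Table I (HHP vs. DCB).
s_r = improvement(0.085, 0.091, larger_is_better=False)
s_p = improvement(0.087, 0.081, larger_is_better=True)
```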

From Tables I and II, for all three datasets the DCB shows a great advantage in the recommendation accuracy for low-degree items, as well as in the inter-diversity and the inner-diversity, while maintaining a high overall recommendation accuracy.

For recommendation accuracy, we focus on both the overall accuracy and the accuracy for cold items. Compared with the highly accurate PBS method, the DCB outperforms the PBS in all metrics. Taking the RYM as an example, the DCB outperforms the PBS by as much as 164.6% and 85.0% in the accuracy for low-degree items ($r_{k<10}$ and $P_{k<10}$), by 37.0% and 10.0% in the overall accuracy ($r$ and $P$), and by 5.9% and 3.5% in the inter-diversity $D_{\rm inter}$ and the inner-diversity $D_{\rm inner}$. A similarly strong performance of the DCB relative to the PBS is observed for the Netflix and the MovieLens, indicating that the DCB is highly accurate.
TABLE I: The overall ranking score $r$, the item-degree dependent ranking score $r_{k<10}$, the overall precision $P$, the item-degree dependent precision $P_{k<10}$, the inter-diversity $D_{\rm inter}$, and the inner-diversity $D_{\rm inner}$ of the PBS, the HHP, the OHHP, and the DCB methods for the RYM, the Netflix, and the MovieLens, with $L = 50$.
DCB


0.046 


0.181 


0.040 
0.946


0.882 


PBS 
0.472
0.000


0.435 


0.636 


HHP 


0.044 


0.428 
0.00004


0.595 


0.672 


NetflMRP 0.043 


0.345 
DCB


0.046 


0.343 0.059 
0.784


0.807 


PBS 


0.106 
0.075


0.000 


0.618 


0.645 


HHP 


0.085 


0.427 


0.087 


0.00020 


0.836 


0.712 


M0-6MS& 


0.085 


0.385 


0.085 


0.00037 


0.813 


0.703 


DCB 


0.091 


0.345 


0.081 


0.00057 


0.883 


0.764 



TABLE II: The percentage improvement of the PBS, the HHP, and the OHHP against the DCB in the overall ranking score $r$, the item-degree dependent ranking score $r_{k<10}$, the overall precision $P$, the item-degree dependent precision $P_{k<10}$, the inter-diversity $D_{\rm inter}$, and the inner-diversity $D_{\rm inner}$, for the RYM, the Netflix, and the MovieLens, with $L = 50$. For readability, a value is shown as positive if the DCB outperforms the other method on that indicator, and as negative otherwise.
0.0%
0.0%
0.0%
1.8% 1.4%


Netflix-BS 
37.6%


6.8% 


100.0% 


44.5% 21.2% 
24.1% 16.7%
26.7% 15.0%
23.8%
-6.6%


11.6% 
The HHP performs excellently in both accuracy and diversity at its optimal hybridization parameter. Compared with the HHP at the optimal parameter determined by the ranking score, the DCB shows a comparable or slightly lower overall recommendation accuracy, but a clear advantage in the recommendation accuracy for cold items. For the RYM, the DCB outperforms the HHP by as much as 84.5% and 60.0% in the accuracy for low-degree items ($r_{k<10}$ and $P_{k<10}$), while showing values of 0.0% and -5.0% in the overall accuracy ($r$ and $P$). For the Netflix and the MovieLens, the DCB shows a small loss in overall accuracy ($r$ and $P$): -4.4% and -5.1% for the Netflix, and -6.6% and -7.4% for the MovieLens. However, the improvements in $r_{k<10}$ and $P_{k<10}$ are 24.8% and 63.6% for the Netflix, and 23.8% and 64.9% for the MovieLens, much larger than the losses in overall accuracy. This further suggests that the DCB is outstanding on the cold-start problem while keeping a high recommendation accuracy. Moreover, the DCB outperforms the HHP in both the inter-diversity $D_{\rm inter}$ and the inner-diversity $D_{\rm inner}$ for all three datasets, with improvements of 1.1% and 2.2% for the RYM, 24.1% and 16.7% for the Netflix, and 5.3% and 6.8% for the MovieLens.

The OHHP method is clearly advantageous on the cold-start problem. Compared with the OHHP at the optimal parameter determined by the ranking score, the DCB further improves the recommendation accuracy for cold items considerably. For the RYM, the DCB outperforms the OHHP by as much as 12.7% and 20.0% in the accuracy for low-degree items ($r_{k<10}$ and $P_{k<10}$), while showing a very similar overall recommendation accuracy. Similar behavior is found for the Netflix and the MovieLens.

To further understand the recommendation effectiveness for cold items, Fig. 3 shows the degree distribution $p(k)$ of the items in the top $L = 50$ recommendation lists. The proportion of cold items decreases in the order DCB, OHHP, HHP, PBS, which indicates that the DCB indeed greatly improves the recommendation of cold items.



(Fig. 3 plot: legend DCB, OHHP, HHP, PBS; panel (b): Netflix; horizontal axis: k.)

FIG. 3: The degree distribution p(k) of the items in the top 
L = 50 recommendation list is displayed. 
FIG. 4: The rescaled hybridization parameter $\lambda$ versus the rescaled average degree $\langle\tilde{k}\rangle$ for the DCB and the OHHP.

Intuitively, improving the recommendation accuracy for cold items might be expected to improve recommendation diversity as well. However, comparing the OHHP with the original HHP, we find that the inter-diversity of the OHHP is slightly lower than that of the HHP for all three datasets, while its inner-diversity is slightly higher than that of the HHP for the RYM and the Netflix, but lower for the MovieLens. This suggests that the OHHP shows no clear advantage in recommendation diversity, although it greatly improves the recommendation accuracy for cold items.

To better understand these observations, Fig. 4 shows the rescaled hybridization parameter $\lambda$ versus the rescaled average degree $\langle\tilde{k}\rangle$ for the OHHP and the DCB, where the DCB curve is obtained from the empirical study. The OHHP curve deviates from the empirical DCB curve, which may partly explain why the OHHP improves only the recommendation accuracy for cold items without simultaneously enhancing recommendation diversity. Compared with the OHHP, the DCB not only further improves the recommendation accuracy for cold items but also raises recommendation diversity.

Examining the inter-diversity $D_{\rm inter}$ as a function of the recommendation list length $L$ (Fig. 5), we find that it decreases with $L$ for all four methods. This is reasonable, since the recommendation lists of different users overlap more as $L$ grows. Compared with the pure PBS method, the hybrid HHP and OHHP methods show a much slower decay of the inter-diversity. In particular, the DCB exhibits a much higher value and a much slower decay of the inter-diversity over the whole range of $L$. This indicates that the recommendation diversity of the DCB is not only higher but also more stable with respect to $L$ than that of the PBS, the HHP, and the OHHP.

The inner-diversity $D_{\rm inner}$ as a function of $L$ is shown in Fig. 6. For the RYM, $D_{\rm inner}$ increases with $L$ for all four algorithms. For the Netflix and the MovieLens, however, $D_{\rm inner}$ increases with $L$ only for the PBS, the HHP, and the OHHP, whereas it remains stable and high for the DCB. This further suggests that the DCB provides highly and stably diverse recommendations.

Taken together, without requiring any selection of the optimal hybridization parameter according to a specific evaluator, and instead adaptively assigning the parameter according to the relational function between the algorithm and the item degree, the DCB remarkably outperforms the PBS, the HHP, and the OHHP in the recommendation accuracy for cold items, as well as in recommendation diversity, while keeping a high overall recommendation accuracy.

The most common dilemma in hybrid algorithms is how to choose the optimal hybridization parameter for different recommendation focuses. Recommendation accuracy is undoubtedly one of the most important evaluators of algorithm performance; however, even for accuracy, different indicators may point to different optimal parameter values. By relating the data property to the algorithm, we resolve this dilemma of parameter selection among conflicting recommendation focuses.

FIG. 5: The inter-diversity $D_{\rm inter}$ versus the recommendation list length $L$.

FIG. 6: The inner-diversity $D_{\rm inner}$ versus the recommendation list length $L$.

Moreover, the cold-start problem is a long-standing challenge for traditional recommendation systems, since users can hardly become aware of cold items that have insufficient auxiliary information [33, 34]. In most systems, however, cold items make up a large proportion: in the RYM, the Netflix, and the MovieLens, items with degree no more than 1 account for 24.56%, 50.62%, and 41.26% of all items, respectively. Efficient information filtering techniques are therefore needed to solve the cold-start problem. Integrating tag information has been an efficient way to make predictions for cold items [28], but it increases system complexity. The DCB greatly improves the recommendation accuracy for cold items while keeping a high overall accuracy, by building a possible correlation between algorithm design and dataset characteristics.

Furthermore, most past studies overwhelmingly emphasize recommendation accuracy but underestimate the importance of diversity, which is in fact a good measure of how personalized the recommendations are. However, accuracy and diversity form an apparent dilemma in traditional information filtering systems. Typical examples are the PBS and HTS algorithms: the PBS is more accurate but less diverse, whereas the HTS is more diverse but less accurate. The DCB method shows excellent recommendation diversity together with high recommendation accuracy, by finding a relational function between the hybridization parameter and the item degrees that is independent of the recommendation list length.
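The per-item assignment of Eqs. (10) and (12) can be sketched in a few lines of Python. The coefficients are those reported for the three datasets in Section II. The normalization of an individual item degree is assumed here to be a min-max rescaling onto [0, 1], mirroring the [0, 1] rescaling described around Eq. (9); that choice, the function names, and the example degrees are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (a, b, c, d) of Eq. (10), as reported for each dataset.
COEFFS: Dict[str, Tuple[float, float, float, float]] = {
    "RYM": (0.04, 3.31, -0.04, -12.28),
    "Netflix": (0.03, 2.25, 1.75e-9, 19.78),
    "MovieLens": (0.03, 2.48, 4.95e-7, 14.05),
}


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def item_lambdas(item_degrees: Sequence[int], dataset: str) -> List[float]:
    """Eq. (12): lambda_beta = a*exp(b*k~_beta) + c*exp(d*k~_beta) for every item."""
    a, b, c, d = COEFFS[dataset]
    # Assumption: the normalized individual degree is a min-max rescaling of item degrees.
    k_tilde = min_max([float(k) for k in item_degrees])
    return [a * math.exp(b * x) + c * math.exp(d * x) for x in k_tilde]


lams = item_lambdas([1, 50, 100], "MovieLens")  # hypothetical item degrees
```

The sketch reports the fitted values as they are, without clipping; with the RYM coefficients, for example, the most popular item receives a value slightly above 1 (about 1.1).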



VI. CONCLUSION 

In this article, we propose a data characteristic based recommendation algorithm by finding the relational function between the hybridization parameter and the item degrees. We use a combined exponential function, $\lambda = a e^{b\langle\tilde{k}\rangle} + c e^{d\langle\tilde{k}\rangle}$, to fit the relation curve, which is independent of the recommendation list length. With this implementation, hybridization parameters are obtained adaptively according to each individual item's degree. Experimental results show that the proposed method significantly improves performance on the long-standing cold-start problem, as well as recommendation diversity, while keeping a high recommendation accuracy, without requiring any additional auxiliary information.

Previous studies show that most pure methods perform excellently in either recommendation accuracy or diversity, whereas hybrid methods generally perform well in both at an optimal hybridization parameter. However, how to obtain the truly optimal hybridization parameter for different recommendation focuses remains an open question. In this article, we have shown that the dilemma of seeking the optimal parameter of hybrid methods can be elegantly resolved by constructing a relational function between the algorithm and the dataset characteristic.

Furthermore, the cold-start problem is a long-standing challenge in traditional recommendation systems. Because very little auxiliary information is available for cold items, it is hard for users to become aware of them. Utilizing tag information has been an efficient way to address the cold-start problem, but it increases system complexity. The proposed DCB method substantially improves the recommendation accuracy for cold items, as well as the diversity. Our work may shed new light on personalized recommendation.